Quantitative metagenomic analyses based on average genome size normalization.
Abstract
Over the past quarter-century, microbiologists have used DNA sequence information to aid in the characterization of microbial communities. During the last decade, this has expanded from single genes to microbial community genomics, or metagenomics, in which the gene content of an environment can provide not just a census of the community members but direct information on metabolic capabilities and potential interactions among community members. Here we introduce a method for the quantitative characterization and comparison of microbial communities based on the normalization of metagenomic data by estimating average genome sizes. This normalization can relieve comparative biases introduced by differences in community structure, number of sequencing reads, and sequencing read lengths between different metagenomes. We demonstrate the utility of this approach by comparing metagenomes from two different marine sources using both conventional small-subunit (SSU) rRNA gene analyses and our quantitative method to calculate the proportion of genomes in each sample that are capable of a particular metabolic trait. In both environments, we characterize three types of autotrophic organisms: aerobic, photosynthetic carbon fixers (the Cyanobacteria); anaerobic, photosynthetic carbon fixers (the Chlorobi); and anaerobic, nonphotosynthetic carbon fixers (the Desulfobacteraceae). For each type, we determine what proportion of each community it makes up and how differences in environment affect its abundance. These analyses demonstrate how genome proportionality compares to SSU rRNA gene relative abundance and how factors such as average genome size and SSU rRNA gene copy number affect sampling probability and therefore both types of community analysis.
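The abstract does not state the normalization formula, so the following is only a minimal sketch of one common way such a calculation can be set up, not the authors' implementation. It assumes that the number of genome equivalents in a metagenome is the total sequenced bases divided by the average genome size, and that the proportion of genomes carrying a trait is the per-base coverage of a marker gene divided by those genome equivalents. All function names, parameters, and numbers are hypothetical and chosen for illustration.

```python
def genome_equivalents(total_bases: float, avg_genome_size: float) -> float:
    """Estimated number of genomes sampled (assumed: total bases / average genome size)."""
    if total_bases <= 0 or avg_genome_size <= 0:
        raise ValueError("total_bases and avg_genome_size must be positive")
    return total_bases / avg_genome_size


def trait_genome_proportion(
    aligned_bases: float,
    gene_length: float,
    total_bases: float,
    avg_genome_size: float,
    copies_per_genome: float = 1.0,
) -> float:
    """Fraction of genomes carrying a marker gene (illustrative assumption).

    aligned_bases / gene_length is the coverage of the marker gene; dividing
    by copies_per_genome corrects for multi-copy genes; dividing by the
    genome equivalents normalizes for sequencing depth and genome size.
    """
    if gene_length <= 0 or copies_per_genome <= 0:
        raise ValueError("gene_length and copies_per_genome must be positive")
    coverage = aligned_bases / gene_length / copies_per_genome
    return coverage / genome_equivalents(total_bases, avg_genome_size)


# Hypothetical numbers: same sequencing depth and same marker-gene signal,
# but the second community has an average genome size twice as large.
sample_a = trait_genome_proportion(5_000, 1_000, 100e6, 2e6)
sample_b = trait_genome_proportion(5_000, 1_000, 100e6, 4e6)
print(sample_a, sample_b)
```

With these made-up inputs, the second sample yields twice the genome proportion of the first even though the raw marker-gene signal is identical, which illustrates why unnormalized counts can mislead when average genome sizes differ between communities.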
Journal:
- Applied and Environmental Microbiology
Volume 77, Issue 7
Publication date: 2011